Local Learning for Mining Outlier Subgraphs from Network Datasets

Authors

Manish Gupta

Arun Mallya

Subhro Roy

Jason H. D. Cho

Jiawei Han

Abstract

In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs; specifically, a user may want to identify suspicious subgraphs that match a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example, in a co-authorship network, given a subgraph containing three authors, one expects all three to be, say, data mining authors, and one expects the neighborhood to consist mostly of data mining authors as well. A 3-author clique of data mining authors whose neighborhood consists entirely of theory authors is therefore interesting. Similarly, a clique in which one author is a theory author while all other authors (both in the clique and in the neighborhood) are data mining authors is also suspicious. Thus, the existence of low-probability links and the absence of high-probability links can be a good indicator of subgraph outlierness. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.
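The scoring idea in the abstract can be illustrated with a small sketch. This is not the authors' system: the attribute weights are fixed by hand rather than learned locally through linear optimization, per-attribute similarity is assumed to be exact match, and the function names (`edge_probability`, `outlier_score`), the `venue` attribute, and the author records are all hypothetical. The score simply adds 1 - p for every link that exists and p for every link that is absent, so low-probability links and missing high-probability links both raise it.

```python
from itertools import combinations


def edge_probability(a, b, weights):
    """Weighted fraction of attributes on which nodes a and b agree.

    Assumption: per-attribute similarity is 1 for equal values and 0 otherwise.
    """
    total = sum(weights.values())
    agree = sum(w for attr, w in weights.items() if a[attr] == b[attr])
    return agree / total


def outlier_score(nodes, edges, attrs, weights):
    """Add (1 - p) for each existing link and p for each absent link."""
    score = 0.0
    for u, v in combinations(sorted(nodes), 2):
        p = edge_probability(attrs[u], attrs[v], weights)
        linked = (u, v) in edges or (v, u) in edges
        score += (1.0 - p) if linked else p
    return score


# Hypothetical 3-author clique: two data mining authors and one theory author.
attrs = {
    "a1": {"area": "data mining", "venue": "KDD"},
    "a2": {"area": "data mining", "venue": "KDD"},
    "a3": {"area": "theory", "venue": "STOC"},
}
weights = {"area": 0.75, "venue": 0.25}  # hand-picked, not learned
clique = {("a1", "a2"), ("a1", "a3"), ("a2", "a3")}

score = outlier_score(attrs.keys(), clique, attrs, weights)
print(score)
```

In this toy clique the two links to the theory author are the ones that contribute to the score, which mirrors the abstract's example of a theory author embedded among data mining authors. A faithful implementation would also account for the subgraph's neighborhood and would learn the weights per query, which this sketch does not attempt.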


Similar resources

Statistical Detection of Network-Level Outliers

The search for anomalies or outliers in large-scale network data has important security applications and can help to uncover interesting behavior. Four different types of anomalies can be discovered in static and dynamic network data: nodes, edges, small subgraphs, and/or larger (sub) networks. To date, most of the research in this area has focused on identifying anomalous nodes, links, or smal...


A Comparative Study of RNN for Outlier Detection in Data Mining

We have proposed replicator neural networks (RNNs) for outlier detection [8]. Here we compare RNN for outlier detection with three other methods using both publicly available statistical datasets (generally small) and data mining datasets (generally much larger and generally real data). The smaller datasets provide insights into the relative strengths and weaknesses of RNNs. The larger datasets...


A Meta analysis study of outlier detection methods in classification

An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins, 1980). Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion. The existence of outliers can indicate individuals or groups that have behavior very different to the most of the individuals of the...


On detection of outliers and their effect in supervised classification

An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins, 1980). Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion. The existence of outliers can indicate individuals or groups that have behavior very different from the most of the individuals of t...


An empirical study of the effect of outliers on the misclassification error rate

An outlier is an observation that deviates so much from other observations that it seems to have been generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion. The existence of outliers can indicate individuals or groups that exhibit a behavior that is very different from most of the individuals of the data set. Fr...


Publication date: 2014
